FIFA 19 is a football (soccer) simulation video game developed by EA Sports. It is part of the FIFA series, which has been produced for over 20 years. A new FIFA game is released every year; FIFA 19 was released in 2018, at the beginning of the 2018-2019 season of the major soccer leagues in Europe. FIFA 19 has over 31 leagues and more than 720 playable teams from around the world. The game contains an enormous amount of data describing ratings and other information for each player, ranging from age and nationality to skills such as finishing, kicking, heading, tackling and even weak-foot strength.
How does EA Sports compile such a huge database of ratings for every single player from all the licensed leagues? EA Sports employs a team of 25 EA Producers and 400 outside data contributors, who are led by the Head of Data Collection & Licensing. This team is responsible for ensuring all player data is up to date, while a community of over 6,000 FIFA Data Reviewers, or Talent Scouts, from all over the world constantly provides suggestions and alterations to the database.
In this project, our team tries to extract several insights from the dataset using exploratory data analysis (EDA) and other statistical methods.
Sources:
https://www.ea.com/games/fifa
https://www.fifplay.com/fifa-19-leagues-and-teams/
https://www.goal.com/en-ae/news/fifa-player-ratings-explained-how-are-the-card-number-stats/1hszd2fgr7wgf1n2b2yjdpgynu
In this study, the CSV data file comes from the FIFA 19 dataset (https://www.kaggle.com/karangadiya/fifa19). The dataset contains information on 18,207 soccer players in FIFA 19, with 89 variables such as name, age, nationality, skill level, potential, club, transfer value, wage, preferred foot, body type, position on the field, height, and weight.
For a basic data visualization, the latitude and longitude in the dataset are used to draw a map with the leaflet library in R.
First, we analyzed how player value differs by playing position and found that strikers are valued more highly than midfielders and defenders.
Because strikers are the most valuable players, we focus on analyzing what skills made them so valuable.
As can be seen from the above correlation plot, many of the skills are correlated. Therefore, to simplify the data and gain a clearer understanding of which skills make strikers so valuable, we further conducted a K-means clustering and a principal component analysis (PCA).
Four clusters are generated from the skills. Because there are too many skills to display, we select only 6 of them to show their loading in each cluster.
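The clustering step can be sketched as follows. This is a minimal illustration on simulated ratings, not the FIFA data: the column names, the sample size, the decision to standardise, and the `nstart` value are all assumptions made for the example.

```r
# Simulated stand-in for striker skill ratings (NOT the FIFA data).
set.seed(1)
n <- 200
skills <- data.frame(
  Finishing   = c(rnorm(n / 2, 75, 5), rnorm(n / 2, 55, 5)),
  ShotPower   = c(rnorm(n / 2, 78, 5), rnorm(n / 2, 58, 5)),
  Strength    = rnorm(n, 65, 8),
  SprintSpeed = rnorm(n, 70, 8)
)

# Standardise so that no single skill dominates the distance calculation.
skills_scaled <- scale(skills)

# Four clusters, as in the text; nstart reruns k-means from several random starts.
km <- kmeans(skills_scaled, centers = 4, nstart = 25)

# Cluster sizes and the mean (standardised) skill level in each cluster.
table(km$cluster)
round(km$centers, 2)
```

The rows of `km$centers` play the role of the per-cluster skill profiles described above: a positive entry means the cluster sits above the overall average on that skill.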
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 3418 individuals, described by 29 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "summary statistics"
## 12 "$call$centre" "mean of the variables"
## 13 "$call$ecart.type" "standard error of the variables"
## 14 "$call$row.w" "weights for the individuals"
## 15 "$call$col.w" "weights for the variables"
The scree plot suggests that four principal components might be optimal for the PCA.
We also checked the contribution of each skill to each of the four principal components.
To begin the analysis, we drop all character variables and delete missing values. Additionally, we drop the ID variable, as it provides no usable information for our analysis.
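As a rough sketch of what the scree plot and the contribution check compute, the example below runs a PCA on simulated ratings. Base R's `prcomp` is used here in place of whichever PCA routine produced the output above, and the data and column names are invented for illustration.

```r
# Simulated skill ratings (NOT the FIFA data): three correlated skills, two independent ones.
set.seed(2)
n <- 300
base <- rnorm(n)
skills <- data.frame(
  Finishing   = 70 + 8 * base + rnorm(n, 0, 3),
  ShotPower   = 72 + 7 * base + rnorm(n, 0, 4),
  BallControl = 68 + 6 * base + rnorm(n, 0, 4),
  Strength    = rnorm(n, 65, 8),
  Jumping     = rnorm(n, 66, 8)
)

pca <- prcomp(skills, center = TRUE, scale. = TRUE)

# Eigenvalues and the share of variance per component (what a scree plot displays).
eig <- pca$sdev^2
prop_var <- eig / sum(eig)
round(prop_var, 3)

# Contribution (%) of each variable to each component: squared loading x 100.
# prcomp loadings have unit length, so every column sums to 100.
contrib <- 100 * pca$rotation^2
round(contrib[, 1:4], 1)
```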
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.0 39.0 49.0 48.5 60.0 92.0
## [1] 15.7
## Penalties Aggression Age Stamina ShotPower Balance
## Penalties 1.0000 0.336 0.1394 0.5164 0.795 0.483
## Aggression 0.3364 1.000 0.2652 0.6460 0.492 0.185
## Age 0.1394 0.265 1.0000 0.0979 0.157 -0.090
## Stamina 0.5164 0.646 0.0979 1.0000 0.616 0.475
## ShotPower 0.7952 0.492 0.1570 0.6164 1.000 0.459
## Balance 0.4828 0.185 -0.0900 0.4749 0.459 1.000
## BallControl 0.7699 0.550 0.0851 0.7286 0.831 0.601
## LongPassing 0.5427 0.591 0.1817 0.6358 0.672 0.462
## HeadingAccuracy 0.5520 0.693 0.1472 0.6346 0.612 0.169
## Strength 0.0545 0.474 0.3333 0.2628 0.169 -0.391
## BallControl LongPassing HeadingAccuracy Strength
## Penalties 0.7699 0.543 0.552 0.0545
## Aggression 0.5500 0.591 0.693 0.4739
## Age 0.0851 0.182 0.147 0.3333
## Stamina 0.7286 0.636 0.635 0.2628
## ShotPower 0.8314 0.672 0.612 0.1692
## Balance 0.6009 0.462 0.169 -0.3908
## BallControl 1.0000 0.789 0.658 0.0878
## LongPassing 0.7887 1.000 0.511 0.1143
## HeadingAccuracy 0.6582 0.511 1.000 0.4869
## Strength 0.0878 0.114 0.487 1.0000
The EDA process begins by subsetting the data set to include only the variables of interest. We keep Penalties as our dependent variable and include 9 explanatory variables. The summary statistics for Penalties indicate that players’ penalty skill ranges from a low of 5.0 to a high of 92.0, with a mean of 48.5, a median of 49.0 and a standard deviation of 15.7. The correlation matrix suggests that most variables are correlated with one another to some degree, and the graphical display of the matrix confirms this visually. The data may be appropriate for linear regression, but the high correlations among some variables (for example, 0.83 between ShotPower and BallControl) may indicate multicollinearity.
##
## Call:
## lm(formula = Penalties ~ ., data = fifa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.00 -5.15 0.31 5.50 47.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.76672 0.64153 1.20 0.23
## Aggression -0.16083 0.00599 -26.87 < 2e-16 ***
## Age 0.32554 0.01471 22.13 < 2e-16 ***
## Stamina -0.05354 0.00671 -7.98 1.6e-15 ***
## ShotPower 0.44502 0.00671 66.36 < 2e-16 ***
## Balance 0.08893 0.00705 12.62 < 2e-16 ***
## BallControl 0.39634 0.01003 39.51 < 2e-16 ***
## LongPassing -0.12904 0.00722 -17.87 < 2e-16 ***
## HeadingAccuracy 0.17451 0.00645 27.04 < 2e-16 ***
## Strength -0.05908 0.00761 -7.76 8.8e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.42 on 18137 degrees of freedom
## Multiple R-squared: 0.713, Adjusted R-squared: 0.713
## F-statistic: 5e+03 on 9 and 18137 DF, p-value: <2e-16
## Aggression Age Stamina ShotPower
## 2.77 1.21 2.92 3.42
## Balance BallControl LongPassing HeadingAccuracy
## 2.54 7.17 3.14 3.22
## Strength
## 2.34
We build a multiple linear regression model where we regress Penalties on Aggression, Age, Stamina, ShotPower, Balance, BallControl, LongPassing, HeadingAccuracy and Strength. The intercept is estimated at 0.77, but it is not statistically significant at conventional levels (p = 0.23). Interpreting the intercept would in any case require setting all explanatory variables to zero, which does not make sense here: Age is an explanatory variable, and an age of zero is not realistic. Additionally, the explanatory variables in the data set do not take on values of zero. We therefore regard the insignificance of the intercept as unproblematic.
The coefficients on all explanatory variables are statistically significant at the 0.1 percent level, which suggests that every variable in the regression contributes to predicting a player’s penalty skill. By coefficient size, ShotPower (0.445), BallControl (0.396) and Age (0.326) are the three largest positive contributors, followed by HeadingAccuracy (0.175). Aggression (-0.161), LongPassing (-0.129) and Strength (-0.059) are the three largest negative contributors. Penalty skill seems to be influenced positively by shooting ability, ball control and experience rather than by brute strength.
The VIF values for the explanatory variables are all below the commonly used threshold of 10, which suggests that multicollinearity is not severe, although BallControl comes closest with a value of 7.17. The multiple regression model has an R-squared value of 0.713, meaning that it explains about 71 percent of the variance in the dependent variable.
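To make the VIF values concrete, the sketch below computes them by hand from the definition VIF_j = 1 / (1 - R²_j), where R²_j is the R-squared from regressing predictor j on the remaining predictors. The data are simulated and the variable names `x1`, `x2`, `x3` are placeholders; only the last line uses a number from the output above.

```r
# Simulated predictors (NOT the FIFA data); x2 is deliberately correlated with x1.
set.seed(3)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.4)
x3 <- rnorm(n)
d  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others.
vif_manual <- sapply(names(d), function(v) {
  others <- setdiff(names(d), v)
  r2 <- summary(lm(reformulate(others, response = v), data = d))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 2)

# Reading the reported value backwards: a VIF of 7.17 implies R^2_j = 1 - 1/7.17.
implied_r2 <- 1 - 1 / 7.17
round(implied_r2, 3)
```

Read this way, the reported VIF of 7.17 for BallControl means that about 86 percent of its variance is already explained by the other eight predictors.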
The histogram of the regression residuals suggests that the residuals are close to normally distributed. Overall, the multiple regression model fits well and predicts a player’s penalty skill with a residual standard error of about 8.4 rating points.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.274 1.360 0.9360 0.8211 0.6344 0.5581 0.4845
## Proportion of Variance 0.517 0.185 0.0876 0.0674 0.0402 0.0312 0.0235
## Cumulative Proportion 0.517 0.702 0.7896 0.8571 0.8973 0.9284 0.9519
## PC8 PC9 PC10
## Standard deviation 0.4611 0.4108 0.31511
## Proportion of Variance 0.0213 0.0169 0.00993
## Cumulative Proportion 0.9732 0.9901 1.00000
Although the multiple linear regression model performs well, we investigate PCA/PCR as a way to reduce the dimensions of the model. The output for the scaled data displays the principal components and their summary statistics. Four principal components are enough to explain about 86 percent of the variation in the data set, while six principal components explain about 93 percent. The PCA graph displays the proportion of variance explained by each principal component.
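The proportions quoted above follow directly from the standard deviations in the output: each component's variance is its standard deviation squared, divided by the total. The short check below re-derives them from the printed (rounded) values, so the results agree with the output only up to rounding.

```r
# Standard deviations of PC1..PC10, copied from the output above.
sdev <- c(2.274, 1.360, 0.9360, 0.8211, 0.6344,
          0.5581, 0.4845, 0.4611, 0.4108, 0.31511)

eig  <- sdev^2             # variance (eigenvalue) of each component
prop <- eig / sum(eig)     # proportion of variance
cum  <- cumsum(prop)       # cumulative proportion

round(rbind(prop, cum), 3)
```

The eigenvalues sum to roughly 10, the number of standardised variables that entered the PCA.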
## Data: X dimension: 14517 9
## Y dimension: 14517 1
## Fit method: svdpc
## Number of components considered: 9
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 15.7 11.15 10.67 10.57 8.927 8.542 8.502
## adjCV 15.7 11.15 10.67 10.57 8.926 8.541 8.502
## 7 comps 8 comps 9 comps
## CV 8.503 8.49 8.456
## adjCV 8.503 8.49 8.455
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 51.30 71.26 80.84 86.18 90.50 93.92 96.54
## Penalties 49.61 53.86 54.68 67.70 70.42 70.70 70.70
## 8 comps 9 comps
## X 98.88 100.00
## Penalties 70.80 71.03
## [1] "---------MSE Linear Regression---------"
## [1] 70.8
## [1] "---------MSE PCR ---------"
## [1] 69.9
## Data: X dimension: 18147 9
## Y dimension: 18147 1
## Fit method: svdpc
## Number of components considered: 6
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## X 51.28 71.28 80.84 86.19 90.50 93.93
## Penalties 49.87 54.19 54.92 67.64 70.59 70.90
We decide to build a PCA model and perform PCR. Ultimately, we would like to compare the MSE of the multiple linear regression model with the MSE of the PCR model and choose the model with the lower MSE. We split the data into training and testing data sets using an 80/20 split. We then fit the PCR model on the training data set using 10-fold cross-validation. The output from the PCR fit suggests that the RMSEP levels off at 8.502 with 6 principal components; adding the remaining components lowers it only marginally, to 8.456 with all 9. The validation plot of the PCR displays the RMSEP value for each number of components. We then compare the MSE of the linear regression model with the MSE of the PCR model. The MSE of the linear regression (70.8) is slightly higher than the MSE of the PCR (69.9). The PCR with 6 principal components performs well and displays a lower MSE than linear regression; therefore, we build a PCR model with 6 principal components and suggest that PCR is the better model.
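The comparison can be sketched in base R as below. This is not the fitting routine that produced the output above: it uses `prcomp` plus `lm` on simulated data, evaluates on a single hold-out set instead of 10-fold cross-validation, and the helper name `pcr_test_mse` is invented for the example.

```r
# Simulated data (NOT the FIFA data): a and b are strongly correlated.
set.seed(4)
n <- 1000
z <- rnorm(n)
X <- data.frame(a = z + rnorm(n, sd = 0.5),
                b = z + rnorm(n, sd = 0.5),
                c = rnorm(n),
                d = rnorm(n))
y <- 2 * X$a + X$c + rnorm(n)

# 80/20 train/test split.
train_idx <- sample(n, size = 0.8 * n)
X_train <- X[train_idx, ]
X_test  <- X[-train_idx, ]
y_train <- y[train_idx]
y_test  <- y[-train_idx]

# PCA is fitted on the training predictors only; test rows reuse the same rotation.
pca <- prcomp(X_train, center = TRUE, scale. = TRUE)
scores_train <- as.data.frame(pca$x)
scores_test  <- as.data.frame(predict(pca, newdata = X_test))

# Hypothetical helper: test MSE of a PCR that keeps the first k components.
pcr_test_mse <- function(k) {
  cols <- paste0("PC", seq_len(k))
  fit  <- lm(y ~ ., data = cbind(y = y_train, scores_train[, cols, drop = FALSE]))
  pred <- predict(fit, newdata = scores_test[, cols, drop = FALSE])
  mean((y_test - pred)^2)
}

# Ordinary least squares on the original predictors, for comparison.
ols     <- lm(y ~ ., data = cbind(y = y_train, X_train))
mse_ols <- mean((y_test - predict(ols, newdata = X_test))^2)

mse_pcr <- sapply(1:4, pcr_test_mse)
round(c(OLS = mse_ols, setNames(mse_pcr, paste0("PCR_", 1:4))), 3)
```

One property worth keeping in mind when reading such comparisons: a PCR that keeps every component reproduces the ordinary least-squares predictions, so any difference in test MSE comes from dropping components (or from evaluating the two models on different data).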
#import data
Goalkeepers’ potential ratings are approximately normally distributed; the majority of ratings lie in the 60-70 range.
The scatterplot of current rating and age shows a general upward trend: older goalkeepers tend to have higher ratings.
The scatterplot of age and potential rating shows a general downward trend: younger goalkeepers generally have higher potential.
##
## Call:
## lm(formula = Potential ~ Age + Height + Weight + Reactions +
## Jumping + GKDiving + GKHandling + GKKicking + GKPositioning +
## GKReflexes, data = gk)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.919 -1.771 -0.393 1.464 11.899
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.91229 2.51635 13.48 < 2e-16 ***
## Age -0.76035 0.01378 -55.20 < 2e-16 ***
## Height 0.01137 0.01495 0.76 0.44701
## Weight -0.01951 0.00530 -3.68 0.00024 ***
## Reactions 0.06030 0.01015 5.94 3.4e-09 ***
## Jumping 0.01086 0.00603 1.80 0.07181 .
## GKDiving 0.19153 0.01917 9.99 < 2e-16 ***
## GKHandling 0.24788 0.01670 14.84 < 2e-16 ***
## GKKicking 0.03788 0.01171 3.24 0.00123 **
## GKPositioning 0.20538 0.01632 12.59 < 2e-16 ***
## GKReflexes 0.14414 0.01849 7.80 1.0e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.61 on 2014 degrees of freedom
## Multiple R-squared: 0.833, Adjusted R-squared: 0.832
## F-statistic: 1e+03 on 10 and 2014 DF, p-value: <2e-16
##
## Call:
## lm(formula = Potential ~ Age + Height + Weight + Reactions +
## Jumping + GKDiving + GKHandling + GKKicking + GKPositioning +
## GKReflexes, data = gk)
##
## Coefficients:
## (Intercept) Age Height Weight Reactions
## 33.9123 -0.7604 0.0114 -0.0195 0.0603
## Jumping GKDiving GKHandling GKKicking GKPositioning
## 0.0109 0.1915 0.2479 0.0379 0.2054
## GKReflexes
## 0.1441
Residuals vs Fitted: This plot shows if residuals have non-linear patterns.
Equally spread residuals around a horizontal line without distinct patterns -> does not show non-linear relationships -> good.
Normal Q-Q: This plot shows if residuals are normally distributed.
Residuals lie quite well on the straight line -> normally distributed.
Scale-Location: This plot shows if residuals are spread equally along the ranges of predictors -> check the assumption of equal variance (homoscedasticity).
Horizontal line has equally (randomly) spread points -> good
Residuals vs Leverage: This plot helps us find influential outliers.
Outliers are not influential.
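The visual checks above have numeric counterparts that can be inspected directly. The sketch below uses a simulated regression rather than the goalkeeper model, and the cut-offs shown are common rules of thumb, not fixed standards.

```r
# Simulated regression (NOT the goalkeeper model).
set.seed(5)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

# Quantities behind the four diagnostic plots.
res  <- rstandard(fit)        # standardised residuals (Q-Q and Scale-Location)
lev  <- hatvalues(fit)        # leverage of each observation
cook <- cooks.distance(fit)   # influence of each observation

# Rules of thumb (conventions, not hard limits).
which(cook > 0.5)             # points at or beyond the 0.5 Cook's distance contour
which(abs(res) > 3)           # unusually large residuals
which(lev > 2 * mean(lev))    # high-leverage points

# The four plots themselves can be drawn with:
# par(mfrow = c(2, 2)); plot(fit)
```

An empty result from the first `which()` call is the numeric version of the statement that no outlier is influential.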
## Response variable: Potential
## Total response variance: 40.6
## Analysis based on 2025 observations
##
## 10 Regressors:
## Age Height Weight Reactions Jumping GKDiving GKHandling GKKicking GKPositioning GKReflexes
## Proportion of variance explained by model: 83.3%
## Metrics are normalized to sum to 100% (rela=TRUE).
##
## Relative importance metrics:
##
## lmg
## Age 0.26854
## Height 0.00657
## Weight 0.00745
## Reactions 0.07016
## Jumping 0.01615
## GKDiving 0.15499
## GKHandling 0.13791
## GKKicking 0.06897
## GKPositioning 0.11298
## GKReflexes 0.15628
##
## Average coefficients for different model sizes:
##
## 1X 2Xs 3Xs 4Xs 5Xs 6Xs 7Xs
## Age -0.1528 -0.4838 -0.6293 -0.6906 -0.7186 -0.73365 -0.74342
## Height 0.1777 0.1210 0.0963 0.0806 0.0693 0.05935 0.04900
## Weight 0.0385 -0.0110 -0.0329 -0.0400 -0.0409 -0.03888 -0.03536
## Reactions 0.3299 0.1970 0.1156 0.0696 0.0468 0.03829 0.03827
## Jumping 0.1530 0.0706 0.0314 0.0141 0.0075 0.00568 0.00601
## GKDiving 0.5835 0.5400 0.4891 0.4362 0.3846 0.33650 0.29280
## GKHandling 0.5698 0.4880 0.4242 0.3761 0.3407 0.31465 0.29467
## GKKicking 0.4572 0.2972 0.1966 0.1356 0.0992 0.07704 0.06259
## GKPositioning 0.4573 0.3454 0.2695 0.2214 0.1940 0.18155 0.17942
## GKReflexes 0.5552 0.5203 0.4751 0.4249 0.3733 0.32253 0.27378
## 8Xs 9Xs 10Xs
## Age -0.75048 -0.7559 -0.7604
## Height 0.03766 0.0251 0.0114
## Weight -0.03080 -0.0255 -0.0195
## Reactions 0.04314 0.0509 0.0603
## Jumping 0.00735 0.0091 0.0109
## GKDiving 0.25399 0.2202 0.1915
## GKHandling 0.27810 0.2630 0.2479
## GKKicking 0.05229 0.0444 0.0379
## GKPositioning 0.18419 0.1934 0.2054
## GKReflexes 0.22766 0.1844 0.1441
The model explains 83.3% of the variance in a goalkeeper’s potential rating. Age is the most important contributor to the model’s R-squared value, accounting for about 27% of the explained variance under the lmg metric. Current abilities in reflexes and diving are also important determinants, at about 16% and 15% respectively.
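For readers unfamiliar with the lmg metric, the sketch below computes it by brute force for three simulated predictors: each predictor's share is the R-squared it adds when it enters the model, averaged over every possible order of entry. The data, the variable names and the helper functions `r2` and `perms` are assumptions made for the example; enumerating all orderings is only practical for a handful of predictors.

```r
# Simulated data (NOT the goalkeeper data).
set.seed(6)
n  <- 400
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
x3 <- rnorm(n)
d  <- data.frame(y = 1.5 * x1 + 0.8 * x2 + 0.3 * x3 + rnorm(n), x1, x2, x3)
preds <- c("x1", "x2", "x3")

# R-squared of the model using a given subset of predictors (0 for the empty model).
r2 <- function(vars) {
  if (length(vars) == 0) return(0)
  summary(lm(reformulate(vars, response = "y"), data = d))$r.squared
}

# All orderings of a character vector.
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# lmg: average, over all orderings, of the R-squared gained when a predictor enters.
orders <- perms(preds)
lmg <- sapply(preds, function(v) {
  mean(sapply(orders, function(o) {
    pos <- match(v, o)
    r2(o[seq_len(pos)]) - r2(o[seq_len(pos - 1)])
  }))
})

round(lmg, 3)              # raw shares; they sum to the full-model R-squared
round(lmg / sum(lmg), 3)   # normalised to sum to 1, as with rela = TRUE above
```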
First, we cleaned and formatted the Wage and Value columns by removing the euro (€) sign. We then needed to remove the non-numeric characters “M” and “K” while preserving the magnitude they encode. To keep the data meaningful, we created a column for M and a column for K for both Wage and Value and copied the corresponding entries into these columns before removing the characters, so that only digits and decimal points remained. We then multiplied the M column by 1,000,000 and the K column by 1,000. After this, we merged them back into the respective Wage and Value columns, now formatted as numeric values ready for operations and functions. We also changed the integer skill values to numeric and omitted NA values in case there were any.
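The same conversion can be written more compactly than with separate M and K columns. The function below is an alternative sketch, not the code used in this project; the name `parse_money` and the example strings are made up, and the input format (a euro sign, a number, and an optional M or K suffix) is assumed from the description above.

```r
# Convert strings such as "€110.5M" or "€565K" to plain numbers (in euros).
# "\u20ac" is the euro sign, written as an escape to avoid encoding problems.
parse_money <- function(x) {
  x <- gsub("\u20ac", "", x, fixed = TRUE)
  # Multiplier implied by the suffix: M = millions, K = thousands, none = units.
  mult <- ifelse(grepl("M$", x), 1e6, ifelse(grepl("K$", x), 1e3, 1))
  as.numeric(sub("[MK]$", "", x)) * mult
}

example <- c("\u20ac110.5M", "\u20ac565K", "\u20ac0", NA)
parse_money(example)
```

Missing inputs stay `NA`, so they can still be dropped afterwards with `na.omit()`.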
#Check to view the structure of the datasets
str(data_wage_skills)
str(data_value_skills)
We log-transformed Wage and Value because they were heavily skewed to the right. After the log transformation, the histograms look closer to normal, although Wage is still right-skewed. We tried Tukey’s transformation, which can be stronger than log(), but the implementation we used was limited to about 5,000 observations, and our dataset has more than 18,000. This is one of our limitations.
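The effect of the log transformation on skewness can be illustrated with simulated wages. The data below are log-normal draws, not the FIFA wages, and `skew` is a small helper written for this example.

```r
# Simulated right-skewed wages (NOT the FIFA data).
set.seed(7)
wage <- rlnorm(1000, meanlog = 8, sdlog = 1)

# Sample skewness (third standardised moment); hypothetical helper.
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

c(raw = skew(wage), logged = skew(log(wage)))

# Zeros need care: log(0) is -Inf, whereas log1p(0) = log(1 + 0) stays finite.
log(0)
log1p(0)
```

If a column contains zeros, as wage or value fields sometimes do, `log1p()` or filtering out the zero rows avoids infinite values in the transformed data.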
To answer the question, we performed multiple regressions for Wage and Value with Skills and have the results shown below. We showed the top 5 skills that explain Wage and Value.
#Perform Regression for Wage vs Skills
wage_lm <- lm(Wage ~ ., data = data_wage_skills)
summary(wage_lm)
Top 5 Skills that Explain Wage: Reactions - 682.6, Composure - 212.4, Ball Control - 209.5, GK Handling - 176.8, GK Diving - 174.0
#Perform Regression for Values vs Skills
value_lm <- lm(Value ~ ., data = data_value_skills)
summary(value_lm)
Top 5 Skills that Explain Value: Reactions - 208049, Short Passing - 56542, Composure - 53780, Ball Control - 52684, GK Diving - 44531
Interestingly, the skills that affect both wage and value the most are Reactions, Composure, Ball Control and GK Diving. All four appear in the top 5 skills for both wage and value, which makes them the strongest predictors in these models. Beyond these four skills, GK Handling appears only in the top 5 for wage and Short Passing only in the top 5 for a player’s value.
We then checked whether the models satisfy the assumptions of linear regression. We plotted the residuals and looked at the VIF values.
We reviewed the plots and both Wage and value have similar results. These are our observations:
Residuals vs Fitted - We want the residuals scattered around a horizontal line with no non-linear pattern. In this case the trend line is slightly curved but remains roughly linear given the size of the dataset (acceptable).
Q-Q plot - We want the residual points to stay close to the line. In this case, they deviate from the line severely (not good).
Scale-Location - We want the residuals spread randomly along the line. In this case they are not: they are concentrated on the left and spread out towards the right, which suggests heteroscedasticity (not good).
Residuals vs Leverage - We want all points to lie inside the Cook’s distance contours (the dashed lines). In this case the dashed lines are not visible in the plot, which usually means that all points lie well inside them, but we cannot confirm this from the plot alone.
#Perform VIF for Wage vs Skills Linear Model
vif(wage_lm)
#Perform VIF for Value vs Skills Linear Model
vif(value_lm)
It seems that we have multicollinearity, since our VIFs are far greater than 5. The independent variables are strongly correlated with each other, which suggests that the individual coefficients are poorly estimated.
In this section, we perform exploratory data analysis (EDA) on several numerical variables in the dataset. After assessing some of the characteristics of those variables, we decided to clean the data by omitting outliers and NA values.
In this chapter, we use logistic regression with 7 variables: age, skill level, wage, preferred foot, position, height, and weight.
After transforming some variables, we obtained the new data set. It has 4 numeric variables (age, skill level, height, and weight) and 3 categorical variables (Wage_dummy, position, and preferred foot).
In this step, we check normality for the 4 numeric variables: age, skill level, height, and weight.
According to the above Q-Q plots, these four numeric variables are not far from normally distributed. However, the tails in each Q-Q plot indicate that the data contain outliers. Therefore, outliers need to be removed before the analysis in the next step.
To study the effect of each independent variable on wage, we create a two-way contingency table of the outcome (wage) and each predictor. We use a chi-squared test to see whether the two are independent (that is, have the same frequency distribution). According to the results, wage depends on age, skill level, and position on the field, whereas the remaining variables are independent of wage.
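A chi-squared test of independence on a contingency table can be sketched as follows. The counts are invented for illustration and do not come from the FIFA data; the row and column labels are placeholders.

```r
# Hypothetical counts: rows = position group, columns = wage below/above the threshold.
tab <- matrix(c(480,  20,
                450,  50,
                400, 100),
              nrow = 3, byrow = TRUE,
              dimnames = list(Position = c("GK", "DEF", "FWD"),
                              Wage     = c("low", "high")))

test <- chisq.test(tab)
test$expected    # counts expected if position and wage were independent
test$statistic   # chi-squared statistic
test$p.value     # a small p-value leads us to reject independence
```

A numeric predictor such as age has to be grouped into categories before it can enter such a table.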
Let us now turn our attention to logistic regression models. We run the Wage model including age, preferred.Foot, overall skill level, preferred position in the field, weight, and height.
##
## Call:
## glm(formula = Wage_dummy ~ Age + Preferred.Foot + Overall + Position +
## Weight + Height, family = "binomial", data = fifaNew)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.368 -0.012 -0.003 -0.001 3.225
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -59.27247 5.82873 -10.17 <2e-16 ***
## Age -0.05785 0.03387 -1.71 0.088 .
## Preferred.FootRight -0.02402 0.27749 -0.09 0.931
## Overall 0.71730 0.05174 13.86 <2e-16 ***
## PositionDEF 1.42780 0.59407 2.40 0.016 *
## PositionMID 1.35141 0.60210 2.24 0.025 *
## PositionFWD 1.49305 0.62545 2.39 0.017 *
## Weight -0.00570 0.01281 -0.45 0.656
## Height 0.00184 0.03025 0.06 0.952
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1184.30 on 17848 degrees of freedom
## Residual deviance: 506.66 on 17840 degrees of freedom
## AIC: 524.7
##
## Number of Fisher Scoring iterations: 11
According to the above regression results, the coefficient on age indicates that for every one-year increase in age, the log-odds of earning more than 100,000 euros per week decrease by 0.06. In other words, holding the other variables fixed, younger players have somewhat higher odds of earning more than 100k euros per week than older players. However, this coefficient is significant only at the 10% level (p = 0.088), not at the 5% level.
The coefficient on skill level indicates that for every one-point increase in overall skill, the log-odds of earning more than 100,000 euros per week increase by 0.72. In other words, highly skilled players are more likely to earn more than 100k euros per week than less skilled players. This coefficient is statistically significant.
For position, with Goalkeeper as the baseline, the log-odds of earning more than 100,000 euros per week increase by 1.43 when moving from Goalkeeper to Defender, by 1.35 when moving from Goalkeeper to Midfielder, and by 1.49 when moving from Goalkeeper to Forward. Outfield players are therefore more likely than goalkeepers to earn more than 100k euros per week. All three position coefficients are statistically significant at the 5% level.
By contrast, Preferred.Foot, Weight, and Height are not statistically significant.
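Log-odds are easier to read once they are exponentiated into odds ratios. The snippet below does this for the coefficients discussed above, copied from the regression output; nothing is re-estimated.

```r
# Coefficients (log-odds) copied from the logistic regression output above.
coefs <- c(Age         = -0.05785,
           Overall     =  0.71730,
           PositionDEF =  1.42780,
           PositionMID =  1.35141,
           PositionFWD =  1.49305)

# exp(coefficient) = multiplicative change in the odds per one-unit increase.
odds_ratio <- exp(coefs)
round(odds_ratio, 2)
```

For example, each additional point of overall skill roughly doubles the odds of being in the high-wage group, while each additional year of age multiplies the odds by a factor slightly below one.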
The receiver operating characteristic (ROC) curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity), and the area under the curve (AUC) summarises it in a single number. The AUC ranges from 0 to 1, where 0.5 corresponds to random guessing and 1 to perfect discrimination. Values higher than 0.8 are generally considered a good model fit.
The result is shown here:
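Here the area under the curve is 0.991, which is well above 0.8. This indicates that the model discriminates very well between the two wage groups. Note that the AUC says nothing about the significance of individual coefficients: as shown above, Preferred.Foot, Weight and Height are not significant.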
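The AUC can also be understood as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch using that rank-based (Mann-Whitney) formulation is shown below; the helper `auc_rank` and the four toy observations are invented for the example, and labels are assumed to be coded 0/1.

```r
# AUC via the rank (Mann-Whitney) formula; hypothetical helper, labels coded 0/1.
auc_rank <- function(score, label) {
  r  <- rank(score)             # average ranks handle tied scores
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example: one negative case (score 0.4) outranks one positive case (score 0.35).
label <- c(0, 0, 1, 1)
score <- c(0.1, 0.4, 0.35, 0.8)
auc_rank(score, label)   # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```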
McFadden’s pseudo-R-squared is another evaluation tool we can use for logistic regressions. It belongs to the family of pseudo-R-squared measures.
In this case, the McFadden value is 0.572. It is loosely analogous to the coefficient of determination R-squared, so roughly 57% of the variation in y is explained by the explanatory variables in the model.
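The McFadden value can be re-derived from the deviances printed in the model summary. For a logistic regression on individual 0/1 outcomes, the deviance equals minus twice the log-likelihood, so the ratio of deviances equals the ratio of log-likelihoods used in McFadden's definition.

```r
# Deviances copied from the logistic regression output above.
null_dev  <- 1184.30   # intercept-only model
resid_dev <- 506.66    # fitted model

# McFadden's pseudo-R-squared: 1 - logLik(model) / logLik(null).
mcfadden <- 1 - resid_dev / null_dev
round(mcfadden, 3)
```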